ABSTRACT 



This document describes how the main principles of 
Perspective Text Analysis are implemented in the PC-system PERTEX. 
The analysis starts from normal text and ends in a topological 
representation of the mentality that the text presents. The text material is processed in 
the following main steps: (1) coding of function words by means of a 
special dictionary; (2) design and coding of blocks according to the 
AaO (Agent-Verb-Objective) paradigm; (3) supplementation of A- and 
O-dummies; (4) generation of A/O matrices; (5) cluster analysis based 
on generated matrices; and (6) topological presentation of outcomes. 
PERTEX gives an integration of all the steps in the analysis, and the 
user is offered numerous comprehensive functions for automatic coding 
and control of syntax. By a multilingual design, PERTEX can operate 
on texts in different languages. The user can select different 
menu languages for the interaction with PERTEX. The technical output 
of the system is illustrated in the appendix with the complete 
17-page printout from analysis of a classic text. (Author/SLD) 
Abstract 

The principles for Perspective Text Analysis have been implemented into the PC-system 
PERTEX. The analysis starts from normal text and ends up in a topological representation of 
the mentality the text presents. The text material is processed by the following main steps: (1) 
coding of function words by means of a special dictionary, (2) design and coding of blocks 
according to the AaO-paradigm, (3) supplementation of A- and O-dummies, (4) generation of 
A/O-matrixes, (5) cluster analysis based on generated matrixes, (6) topological presentation 
of outcomes. PERTEX gives an integration of all the steps in the analysis and the user is 
offered many comprehensive functions for automatic coding and control of syntax. By a 
multilingual design PERTEX can operate on texts in different languages. The user can select 
different menu-languages for the interaction with PERTEX. 
This paper describes how the main principles of Perspective Text Analysis (PTA), Bierschenk 
& Bierschenk (1986 a, b, c), are implemented in the PC-system PERTEX. The description is 
concentrated on the main steps in the analysis without technical details of system design and 
programming. Each step in the analysis is first described in general terms and then partly 
illustrated by small examples. The complete printer output from the analysis of a text is 
enclosed in the appendix. The example in the appendix will be used as a basis for illustrations of 
different aspects in the analysis. 

The paper does not present the great number of screen layouts and parameter options for the 
integrated handling of all the steps in the analysis. As a user of PERTEX you have control of 
all the steps in the analysis and you can stop and restart the process at different stages. 

PERTEX is built to realize the following main steps of PTA: 

(1) Coding of function words for verbs, prepositions, sentence openers, clause openers. 
PERTEX has a specially developed dictionary and language dependent routines for 
identification of function words in the text. 

(2) Design and coding of blocks according to the AaO-paradigm. A block is based on a verb, 
the a-component in AaO. The block also consists of an agent, the A-component, and an 
Objective, the O-component. The O-component is differentiated by prepositions used in 
the text. The limits of a block are in the general case set by sentence- or clause openers, 
e. g. full stop or comma. 

(3) Supplementation of A- and O-dummies. In normal text variables for the A- and 
O-components are sometimes omitted. All the implicit references to and from A- and 
O-components are made explicit in this step. Even different forms of self references are 
handled by PERTEX. 

(4) Generation of A/O-matrixes. The block oriented connection between A and O is a 
corner-stone in PTA. All such connections between the unique A- and O-components in 
the text are organized in different binary A/O-matrixes. 

(5) Cluster analysis based on A/O-matrixes. Ward's method for clustering is used in PERTEX. 
By clustering the O-rows in an A/O-matrix, with the A's as variables, we extract the 
structural relations of the Objective in the text. When, in the transposed matrix, the 
A-rows are clustered, with the O's as variables, we extract the structural relations of the text 
producer's perspective on the Objective. 

(6) Topological presentations of outcomes. The user of PERTEX has to select the significant 
number of clusters in every cluster analysis. Here PERTEX offers not only Ward's 
ESS-values but also different t-tests. The number of selected clusters and the text content of 
every cluster are presented for the user's naming of the clusters. The clusters are then 
organized according to the cluster tree and the user can fulfil the investigation of the text 
by following the synthesis from the clusters to the root of the cluster tree. 

This short exposé of PTA via PERTEX is only meant as a preliminary frame. Now every step 
will be presented with more details and illustrative examples. First, however, some comments 
on import and export of normal text to and from PERTEX. 
Normal text to and from PERTEX 



The input text to Perspective Text Analysis is normal text produced in any purposeful 
context. Such a text can be imported to PERTEX from a text file on diskette. It is also 
possible to write/edit the text by using PERTEX's text editor. A text imported from a text file 
can also be edited by this editor. A normal text in PERTEX can be exported to an external 
text file for use in other systems, e. g. a system for word processing. 

The handling of normal text in PERTEX is illustrated on page A3 in the appendix. The text has 
running numbers per row. These numbers can be used by the text editor in searching specific 
parts of the text. Numbers for reference to rows are also used in coding and supplementation, 
see pages A4-A8. In the automatic functions for control of syntax these numbers are used for 
indication of where errors are located. 

Under the head "Label", see page A3, the user can insert labels indicating the start of a 
specific section of the text. By labels it is possible, later on in the process, to select only some 
part or parts of the text as basis for an analysis. If, for example, a text reports a discussion 
between two or more people, labels can be used to select the text from one of the speakers. 
The use of labels is optional. If no label is used the entire text is used in the analysis. 

In different phases of the analysis PERTEX uses different formats in handling the text. The 
format can be strings of single words, blocks, Agents and Objectives. Starting from normal 
text on a file the text is first made up for dictionary coding. In this format, see page A4, every 
single word, full stop, comma, colon and semi-colon is placed separately on one row. In this 
formatting of the text PERTEX uses special rules for English. Short forms like it's and I'm are 
transformed into the ordinary forms it is and I am. These transformations are necessary 
because PERTEX must operate on single words in coding the text. 
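
The row format described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the routine used in PERTEX: the function name to_rows and the table CONTRACTIONS are hypothetical, and the table holds only the two short forms named in the text.

```python
import re

# Hypothetical expansion table; the paper names only "it's" and "I'm".
CONTRACTIONS = {"it's": ["it", "is"], "i'm": ["I", "am"]}


def to_rows(text):
    """Split normal text into rows with one word or punctuation mark each."""
    rows = []
    # A token is a word (optionally with an internal apostrophe)
    # or one of the marks . , : ;
    for token in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.,:;]", text):
        expanded = CONTRACTIONS.get(token.lower())
        rows.extend(expanded if expanded else [token])
    return rows


# Running row numbers, as used for reference in later phases.
for number, row in enumerate(to_rows("It's late, and I'm tired."), start=1):
    print(number, row)
```

Note that in this sketch an expanded short form takes its spelling from the table, so a sentence-initial "It's" becomes the two rows it and is.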



Dictionary coding 



The dictionary coding is the most language dependent phase of the analysis. Strings for single 
words shall be coded as verbs, prepositions, sentence openers or clause openers. For every 
language there is a unique dictionary. At the moment PERTEX is practically operative on 
English and Swedish texts. The English dictionary has about 6,000 items and the Swedish 
about 3,000 items. For German, French, Danish, Norwegian, Finnish and Latin PERTEX has 
dictionary "embryos". It is also possible to use PERTEX on a text without specification of 
any language. In that case PERTEX is only used as an administrative instrument with control 
functions and the user has to do all the coding manually. 

PERTEX uses language dependent dictionaries for coding single rows in the text formatted 
according to page A4. The codes used are as follows: 



00    sentence opener 
01    clause opener 
40    active verb 
40P   passive verb 
4?    uncertain verb 
60    preposition for ground 
6?    uncertain preposition for ground 
70    preposition for mean 
7?    uncertain preposition for mean 
80    preposition for goal 
8?    uncertain preposition for goal 

These codes are placed by PERTEX in the Code column, see page A5. In this automatic 
coding each row (one word) is matched against the dictionary. If the text string on the row is 
found in the dictionary the corresponding code is set at the row. Verbs can be identified 
irrespective of the conjugation of verbs in the text. The dictionary contains the stem of the 
verb plus codes for the conjugation. For English texts PERTEX operates with six different 
patterns for conjugation of verbs. Each pattern represents four different forms of the verb, e. g. 
walk, walks, walked, walking. Verbs that have irregular paradigms are also included in the 
dictionary. 
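
As a minimal sketch of this matching, the following Python fragment expands each verb stem by a conjugation pattern and then looks up every row. The contents of PATTERNS and DICTIONARY are hypothetical: the paper gives neither the six English patterns nor the dictionary entries, and the codes chosen for full stop and comma are assumptions based on the description of block limits.

```python
# Hypothetical pattern 1: suffixes giving walk, walks, walked, walking.
PATTERNS = {1: ["", "s", "ed", "ing"]}

# Hypothetical entries: string or stem -> (code, pattern number or None).
DICTIONARY = {
    "walk": ("40", 1),
    "by": ("70", None),
    ".": ("00", None),  # assumed: full stop as sentence opener
    ",": ("01", None),  # assumed: comma as clause opener
}


def build_lookup(dictionary, patterns):
    """Expand every stem by its conjugation pattern into a flat lookup."""
    lookup = {}
    for stem, (code, pattern) in dictionary.items():
        suffixes = patterns[pattern] if pattern is not None else [""]
        for suffix in suffixes:
            lookup[stem + suffix] = code
    return lookup


def code_rows(rows, lookup):
    """Pair each row with its dictionary code, or '' when not found."""
    return [(row, lookup.get(row.lower(), "")) for row in rows]


lookup = build_lookup(DICTIONARY, PATTERNS)
print(code_rows(["She", "walked", "by", "."], lookup))
```

Storing the stem once and generating the forms keeps the dictionary small, which is the same motive the paper gives for handling Swedish prefixes by combination.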

For Swedish texts PERTEX operates with ten separate patterns for conjugation of verbs. Here 
each pattern includes 12-14 different forms for the verb. In Swedish a verb is often combined 
with many different prefixes. Storing all these combinations as separate verbs would have resulted in 
a dictionary too large to be handled. Instead PERTEX can combine about 50 Swedish 
prefixes with the items in the Swedish dictionary. 

Dependent on the situation, especially in English, a specific word is a verb or not a verb, a 
preposition or not a preposition. When we read a text it is obvious which words are verbs and 
prepositions. This flexibility in language cannot be managed automatically by simple 
matching of text strings. After many experiments with language dependent rules for 
identification of verbs and prepositions PERTEX can eliminate most of the 4?, 6?, 7? and 8? 
codes generated from the matching of text to dictionary. At the moment PERTEX has 22 
such rules for handling the primary result from dictionary coding of English texts. A short 
example illustrates this important stage of the automatic coding of text. 

A sentence like I want to go to the town by bike to buy a present for my friend's birthday has 
five words that by matching against the dictionary are coded 4? or 6?. This preliminary result 
is processed by special rules in PERTEX so that the final result from dictionary coding of the 
sentence will not include any 6? or 4? code. 



[Table not reproduced: word-by-word codes for the example sentence in two columns, "Only matching to dictionary" and "Final result from dictionary coding".] 

If the dictionary coding ends up with any ?-code left, PERTEX's control of syntax will 
indicate where these ?-codes are. The user of PERTEX then has to decide how these words 
shall be coded. Without elimination of all the ?-codes it is not possible to continue the 
analysis to the next stage. The text in the appendix, page A5, is automatically coded without any 
?-code. 

PERTEX has an optional function for logging all corrections the user makes to the codes 
automatically produced in dictionary coding. This code journal can be used for improvement 
of the dictionary and the routines for dictionary coding. 



Coding of blocks 

In this phase the whole text is formatted and coded in blocks. According to the 
AaO-paradigm a block consists of three types of components: Agent (A), action (a) and Objective 
(O). The a-components are already coded from the dictionary coding. The simple main rule 
for creating a block says that the text strings before the a-component are the Agent and the 
text strings after the a-component are the Objective. In practice it is, however, a complicated 
process to create the blocks from the dictionary coding of the text. First the start row and stop 
row for every block must be identified. Then the following codes are used to give a code to 
every row, see page A6. 

01 Boundary for blocks 

30 Agent (A) 

40 action (a), active verb 

40P action (a), passive verb 

50 Objective (O), Figure 

60 Objective (O), Ground 

70 Objective (O), Mean 

80 Objective (O), Goal 

Bierschenk & Bierschenk (1986 b) present a set of 18 process rules for coding of blocks. These 
rules are used and to some extent transformed into a transaction design in PERTEX. This 
means that the overall frame of reference, naturally present when using the original rules 
manually, is replaced by a more computer oriented design of strict transactions based on 
certain combinations of codes and text strings. 

In PERTEX the block coding is processed in nine separate steps operating with 46 different 
transactions. A simple example of such a transaction can be illustrated by the following 
situation. A row with a 40-code has a succeeding row not yet coded. The code for that 
succeeding row is then set to 50. Another transaction is used when a 50-row is followed by a 
row not yet coded. The code for that following row will be set to 50. From these two 
examples of transactions we see that it is necessary to organize the use of transactions in a 
purposeful way. By the design of block coding in the nine separate steps this requirement is 
fulfilled. The transaction design of block coding has proved to be a practical and flexible 
design for the development of new and language specific rules for computerized block 
coding. 
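
The two transactions named above, and the reason their order matters, can be sketched as follows. This is an illustration only: the representation of a transaction as a pair of codes and the function run_step are assumptions, and the 46 transactions and nine steps of PERTEX are not reproduced.

```python
# Each hypothetical transaction: (code of preceding row, code to set).
AFTER_VERB = ("40", "50")    # uncoded row after an active verb -> 50
AFTER_FIGURE = ("50", "50")  # uncoded row after a 50-row -> 50


def run_step(codes, transaction):
    """Apply one transaction in a single pass from first to last row."""
    previous_code, new_code = transaction
    codes = list(codes)
    for i in range(1, len(codes)):
        if codes[i] is None and codes[i - 1] == previous_code:
            codes[i] = new_code
    return codes


# Rows: Agent, verb, two rows not yet coded (None), block boundary.
start = ["30", "40", None, None, "01"]

in_order = run_step(run_step(start, AFTER_VERB), AFTER_FIGURE)
reversed_order = run_step(run_step(start, AFTER_FIGURE), AFTER_VERB)
print(in_order)        # all rows coded
print(reversed_order)  # one row is left uncoded
```

Applied in the wrong order the second transaction finds no 50-row to start from, so a row stays uncoded. Grouping transactions into ordered steps avoids this.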

The whole process of block coding is too comprehensive to be reported here in detail. The 
sentence used for illustration of dictionary coding can be used here as well, to give an 
example of some of the main principles for block coding. 



[Table not reproduced: the example sentence in two columns, "Final result from dictionary coding" and "Final result from block coding", with the rows grouped under (Start block 1), (Start block 2) and (Start block 3).] 

The boundaries for the three blocks are the 01-codes from dictionary coding. The first block 
is based on the verb want. The text before want is coded 30 as an A-component. After want 
there is originally no text before the boundary between block 1 and block 2. As a block per 
definition must have an Objective we insert a dummy Objective indicated by '*' and coded by 
50. 

In block 2 there is originally no text before the a-component go. As a block must have an 
Agent we insert a dummy Agent indicated by '*' and coded by 30. The Objective in block 2 is 
specified by the 60-code from to and the 70-code from by. Strings following a preposition are 
generally coded by the code from the preposition. 

In block 3 another dummy for the Agent is inserted. After the verb buy the Figure code 50 is 
used for a present and the 80-code from for gives the codes for the remaining words in the 
block. 

The sentence I want to go to the town by bike to buy a present for my friend's birthday 
generates three blocks based on one verb in each block. If explicit Agents and Objectives are 
omitted in a block, dummies are inserted. Here it is important to notice that it is the functional 
position in the block, and not any classification of words, that governs the identification of 
Agents and Objectives. This means that a single word can be an Agent in one block and an 
Objective in another block. 
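
The insertion of dummies by functional position can be sketched as below. The function insert_dummies and the representation of a block as a list of (text, code) rows without its boundary row are assumptions made for the illustration; the actual block coding in PERTEX handles many more cases.

```python
OBJECTIVE_CODES = ("50", "60", "70", "80")


def insert_dummies(block):
    """Return the block with '*' dummies where Agent or Objective is missing.

    block: list of (text, code) rows containing exactly one verb row
    coded 40 or 40P. Boundary rows are assumed to be excluded.
    """
    verb_at = next(i for i, (_, code) in enumerate(block)
                   if code in ("40", "40P"))
    before, verb, after = block[:verb_at], block[verb_at], block[verb_at + 1:]
    if not any(code == "30" for _, code in before):
        before = before + [("*", "30")]   # dummy Agent before the verb
    if not any(code in OBJECTIVE_CODES for _, code in after):
        after = [("*", "50")] + after     # dummy Objective after the verb
    return before + [verb] + after


block_1 = [("I", "30"), ("want", "40")]
block_2 = [("go", "40"), ("to", "60"), ("the", "60"), ("town", "60"),
           ("by", "70"), ("bike", "70")]
print(insert_dummies(block_1))
print(insert_dummies(block_2))
```

Only the position relative to the verb decides where a dummy goes; no word class is consulted.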

This little example gives only an indication of some of the details in PERTEX's automatic 
block coding. The appendix, pages A5 and A6, demonstrates some other details. First we can 
notice that a 01-code is not always a boundary for blocks, see block 1 on rows 2-29. If codes 
for prepositions are located in text strings for an Agent, the 30-code eliminates former 60-, 
70- and 80-codes, see block 1. This is an example of substitution of codes during the process 
of block coding. 

On page A6 row 141, '70 *' is inserted as an O-dummy. This dummy is initiated by '70 with' 
on row 140 as the last word in that block. This 'with' refers to something that will be made 
explicit in the supplementation phase. The same situation can be found for a '60 *' on row 
197. 

On page A5, rows 109 and 110 are both coded 01 from dictionary coding. Two such 01-codes 
close to each other are interpreted as a boundary for a sentence and the first one of the 
01-codes is replaced by a 00-code, see row 128 on page A6. 

If the first block in a sentence has a word as clause opener and first boundary, and this word 
is just before the verb, the general X-Agent is inserted. This Agent is also differentiated by 
the clause opener, see rows 64-65 on page A5 and rows 69-72 on page A6. 

Passive verbs, coded as 40P, can be found on page A6 rows 59, 88, 118 and 169. In block 
coding for an English text the passive form is identified by the verb be followed by another 
verb. These two verbs are merged into one row, e. g. row 59 be_driven. The identification 
of Agent and Objective in a block with a passive verb is handled in a special way, see page 
A6. In Swedish the passive form is normally indicated via the s-form of the verb. Such 
passive verbs are coded 40P already in dictionary coding. 

The system has a function for strict control of the block syntax. In case of any syntax error 
the process cannot continue to supplementation before the user has eliminated all errors. 
After a lot of experiments with different texts and automatic block coding, PERTEX is now 
operating without syntax errors in this complicated phase of the analysis. The interactive 
design of PERTEX makes it possible for the user to manipulate all the codes, and even do all 
the coding manually. 



Supplementation 

In this phase of the analysis all the A- and O-dummies will be supplemented by explicit text 
strings. The main rules in PTA say that an A-dummy is supplemented by the Agent in the 
preceding block and an O-dummy is supplemented by the Agent and the Objective in the 
succeeding block. If a block has the Agent 'it', then that Agent is supplemented by the Agent 
and the Objective from the preceding block. These main rules seem very simple. But in 
practice the supplementation is very complicated because of chained relations between 
several blocks and self references between dummies. 

In Bierschenk & Bierschenk (1986 b) a set of 18 rules stipulates how to handle the 
supplementation. These rules operate with reference numbers for blocks. With chained 
relations between blocks these reference numbers must be handled in complex chains with 
numbers embedded at several levels. These rules are not copied directly to a program in 
PERTEX. By transformation of the rules to a strict transaction design, cf. the design of block 
coding, the supplementation in PERTEX can be done without chains of embedded reference 
numbers. Instead PERTEX operates in an iterative way by using a set of circa 30 transactions 
for specific combinations of codes and text strings. The number of iterations depends on how 
complicated the text is according to A- and O-dummies and the relations between the 
dummies. 



In supplementation it is no longer strings for single words that are handled. It is now the 
entire Agent, 50-Objective, 60-Objective, 70-Objective and 80-Objective that are 
manipulated. Because of this PERTEX starts the supplementation by making up the text from 
block coding into a format for supplementation, see page A7. This transformation of the 
sentence used for demonstration of block coding, and the supplementation of that sentence, 
are as follows: 

In this little example a supplementation is first done for the A-dummy (30 *) in block 2. The 
Agent 'I' from block 1 is used as Agent in block 2 instead of '*'. Then PERTEX continues 
with the A-dummy in block 3, and this dummy is replaced with the Agent from block 2, 
which now is 'I' as a result from the preceding supplementation transaction. All three blocks 
have got the same agent 'I'. We also see that it was necessary to supplement the Agent in 
block 2 before the Agent in block 3. The supplementation of an Agent requires an explicit 
Agent in the preceding block. As illustrated in this little example the supplementation of 
Agents runs from the first to the last block of the text. 

The supplementation of O-dummies runs from the last to the first block. In the example 
above there is only one O-dummy, see block 1. After the supplementation of all the 
A-dummies this O-dummy is supplemented by the Agent and the Objective from block 2. 
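
The two runs can be sketched for the example sentence as follows. The data layout (one dictionary per block, keyed by code) and the function names are assumptions made for the illustration; only the two main rules are covered, not 'it'-Agents, self references or the circa 30 transactions of PERTEX.

```python
OBJECTIVE_CODES = ("50", "60", "70", "80")


def agent_run(blocks):
    """First to last block: replace '30 *' by the preceding explicit Agent."""
    for previous, block in zip(blocks, blocks[1:]):
        if block["30"] == "*" and previous["30"] != "*":
            block["30"] = previous["30"]


def objective_text(block):
    """All Objective strings of a block, in code order."""
    return " ".join(block[c] for c in OBJECTIVE_CODES if c in block)


def objective_run(blocks):
    """Last to first block: replace an O-dummy by the Agent and the
    Objective of the succeeding block, when both are explicit."""
    for i in range(len(blocks) - 2, -1, -1):
        block, successor = blocks[i], blocks[i + 1]
        supplement = objective_text(successor)
        if successor["30"] == "*" or "*" in supplement.split():
            continue  # successor still holds a dummy
        for code in OBJECTIVE_CODES:
            if block.get(code) == "*":
                block[code] = successor["30"] + " " + supplement


blocks = [
    {"30": "I", "40": "want", "50": "*"},
    {"30": "*", "40": "go", "60": "to the town", "70": "by bike"},
    {"30": "*", "40": "buy", "50": "a present",
     "80": "for my friend's birthday"},
]
agent_run(blocks)
objective_run(blocks)
for block in blocks:
    print(block)
```

After the two runs every block has the Agent 'I', and the 50-Objective of block 1 reads 'I to the town by bike'.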

After one Agent-run from top to bottom and one Objective-run from bottom to top all the 
dummies in the example are eliminated. For a real text this is usually not the case. Even after 
an iterative use of Agent-runs and Objective-runs, as illustrated above, some dummies remain 
and cannot be supplemented in this way. The reason can be that the Agent in block n is of 
type 'it'. This Agent requires an explicit Agent and an explicit Objective in block n-1. If the 
Objective in block n-1 is an O-dummy the situation indicates a simple self reference between 
A- and O-dummies. The O-dummy in block n-1 requires an explicit Agent in block n. But the 
agent in block n is the 'it'-dummy we started with. PERTEX breaks this form of reference 
loop when the supplementation cannot continue according to the main rules illustrated above. 
The Agent in block n-1 and the Objective in block n will be the supplement for the 
O-dummy in block n-1. After that '30 it' in block n can be supplemented without problem. Self 
references between A- and O-dummies can be more complicated and involve several blocks. 

Different forms of more sophisticated supplementation indicate the need for an iterative 
process in supplementation. PERTEX works with an iterative change between 
supplementation of A-dummies and O-dummies in as many runs as needed for elimination of 
all dummies. The system always brings this iterative process to a normal stop with all the 
dummies supplemented. The number of iterations reported on screen also includes 
administrative iterations in supplementation. When text strings are combined for new Agents 
and Objectives the result will sometimes end up with the same text string duplicated in an 
Agent or an Objective. All such copies of text are automatically eliminated. 

For the text example in the appendix we have five '30 *', one '30 it', two '50 *', one '60 *' and one 
'70 *', ten dummies in all, see page A7. The result of the supplementation can be found on 
page A8. This text does not illustrate any particular complications in supplementation. 



Generation of A/O matrixes 

In Perspective Text Analysis the block oriented relation between Agent and Objective is a 
corner-stone. PERTEX also calculates how many unique Agent/Objective combinations there 
are in the text. These A/O-combinations are found in the supplemented version of the text and 
presented in binary A/O matrixes. 

Here it is important to notice that the generation of A/O-matrixes is not based on frequencies 
for the Agents and Objectives. As the analysis is built on affinity, and not on frequencies, it is 
the number of unique Agents and Objectives that is of interest. It is the block-wise 
combinations of these unique Agents and Objectives that are organized in the binary matrixes. 
PERTEX produces four types of such matrixes. 



Matrix    Represents 

50/30     Figure 
60/30     Ground 
70/30     Mean 
80/30     Goal 

As indicated by the matrix type, it is the four different aspects of the Objective component 
that are used for a separation into four different types of A/O-matrixes. 

The same (only one) Agent in a block is used in all the four possible A/O-combinations in 
one block. By definition a block has minimum one and maximum four types of A/O- 
combinations. The number and kind of A/O-combinations and A/O-matrixes is only 
dependent on the text and cannot be governed from outside the text. 

The sentence used earlier for illustration of coding has three blocks but only one unique 
Agent, 'I'. This means that all matrixes generated from the supplementation of the sentence 
will be of vector type. With the two unique 50-Objectives ('I to the town by bike' from block 1 
and 'a present' from block 3) the binary matrix of type 50/30 has the dimension 2x1 and has a 
binary '1' in each of the (two) cells. As can be seen from supplementation, the 60/30, 70/30 
and 80/30 matrixes will have the dimension 1x1 with one binary '1' each. This little example, 
primarily used for some basic illustrations of coding, is too small for any realistic 
demonstration of A/O-matrixes. Therefore we now leave that sentence. 
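
The generation of one binary matrix from supplemented blocks can be sketched as below, using the supplemented example sentence as input. The function ao_matrix and the block layout (one dictionary per block, keyed by code) are assumptions made for the illustration.

```python
def ao_matrix(blocks, code):
    """Return (objectives, agents, matrix) for one Objective code.

    Unique strings are numbered in order of first occurrence, and only
    Agents from blocks that have this type of Objective are included.
    """
    objectives, agents, pairs = [], [], set()
    for block in blocks:
        if code not in block:
            continue
        if block[code] not in objectives:
            objectives.append(block[code])
        if block["30"] not in agents:
            agents.append(block["30"])
        pairs.add((objectives.index(block[code]), agents.index(block["30"])))
    matrix = [[1 if (row, col) in pairs else 0
               for col in range(len(agents))]
              for row in range(len(objectives))]
    return objectives, agents, matrix


# The example sentence after supplementation.
blocks = [
    {"30": "I", "40": "want", "50": "I to the town by bike"},
    {"30": "I", "40": "go", "60": "to the town", "70": "by bike"},
    {"30": "I", "40": "buy", "50": "a present",
     "80": "for my friend's birthday"},
]
for code in ("50", "60", "70", "80"):
    objectives, agents, matrix = ao_matrix(blocks, code)
    print(code + "/30", len(objectives), "x", len(agents), matrix)
```

Because the pairs are kept in a set, a combination that occurs in several blocks still yields a single '1', in line with the focus on affinity rather than frequency.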

The appendix, page A9, reports the 17 unique 50-Objectives and the 11 unique Agents that 
are combined in the blocks from supplementation of the text. The unique text strings are 
marked by simple numbers for identification. These numbers are created successively when 
PERTEX is processing the text from the first to the last block. All the unique 50-Objectives 
are reported but only the unique Agents from blocks which have a 50-Objective are reported. 

The 17 unique 50-Objectives and the 11 unique Agents define a 50/30-matrix of size 17x11. 
On page A9 this matrix is technically described in three different ways. The Agent 
coordinates (column index) are given for every 50-Objective. The same matrix is also 
described by the 50-Objective coordinates (row index) for every Agent. Finally the matrix is 
also reproduced in explicit form with a binary '1' according to the coordinates. Here it is 
important to notice that a binary '1' only indicates that the combination of the corresponding 
Agent and 50-Objective is found in at least one block. The matrix does not give any 
indication of how many times this combination is used in the text. As said before, it is not 
frequencies of A/O-combinations but affinity for the unique A/O-combinations that are in 
focus for the analysis. This focus will be dealt with in the forthcoming cluster analysis based 
on the binary matrixes. 

On page A10 the unique text strings and coordinates for the 60/30-matrix are reported. Here 
we see that all the unique text strings indicating Ground begin with a preposition coded 60 in 
dictionary coding. Among the unique Agents we find for example that the first one, the style 
of their ships, is not an Agent combined with 50-Objectives, see page A9. In the second block 
of the text, see page A8 rows 8-18, we find the basis for this situation. In block 3, page A8 rows 
13-17, we see how the Agent there X is combined with both a 50-Objective and a 
60-Objective. 

On page A11 some statistics for the text and the matrixes are reported. The TEXT 
STATISTICS are at present at the front line of the scientific interpretation of 
results from PTA and will not be discussed in this paper. In the section for MATRIX 
STATISTICS we find a short summary of the generation of the matrixes discussed here. 

The 50/30-matrix is based on 17 (81%) of all the blocks in the text. The dimension 17/11 has 
been discussed earlier. The measure for Density is a standardized value (0.0-1.0) indicating 
the proportion of binary '1' in the matrix. Density = 0.0 means that there is no extra binary '1' 
in the matrix. Density = 1.0 means that all unique Objectives are combined with all the unique 
Agents. A Density of 1.0 is not expected to be found for real texts. For the 50/30-matrix the 
Density is set to 0.0 because there are only 17 cells marked '1' in the matrix. This is the 
minimum number of unique combinations marked '1' in a matrix with maximum number of 
rows or columns equal to 17. The interpretation of the Density value is still under investigation. 
Tests on different texts indicate that the structural relations are more complex (see cluster 
analysis and sub-trees discussed in the next section) for matrixes with relatively high Density 
value. 
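
The paper does not state the formula for Density. One reading that agrees with the description above is sketched here as an assumption: the number of '1' cells beyond the minimum max(rows, columns), divided by the largest possible number of such extra cells. The function density and the choice of 0.0 for vector-type matrixes are hypothetical.

```python
def density(matrix):
    """Assumed Density: share of '1' cells beyond the minimum number.

    The minimum number of '1' cells is taken to be max(rows, columns),
    since every unique Objective and every unique Agent occurs in at
    least one combination. This formula is an assumption, not taken
    from the paper.
    """
    rows, cols = len(matrix), len(matrix[0])
    ones = sum(sum(row) for row in matrix)
    minimum = max(rows, cols)
    if rows * cols == minimum:
        return 0.0  # vector-type matrix: no room for extra cells
    return (ones - minimum) / (rows * cols - minimum)


# A 17 x 11 matrix with exactly one '1' per row, i.e. 17 cells in all.
sparse = [[1 if col == row % 11 else 0 for col in range(11)]
          for row in range(17)]
full = [[1, 1], [1, 1]]
print(density(sparse), density(full))
```

Under this reading a 17 x 11 matrix with 17 marked cells gets Density 0.0 and a completely filled matrix gets 1.0, as the text describes.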

On page A11 a 70/30-matrix is reported with dimension 4/4 and the comment Diagonal, no 
structure. This means that the unique combinations between Agents and 70-Objectives 
generate a 4x4 matrix with four binary '1' placed on the diagonal. A cluster analysis based on 
such a matrix is not meaningful because the clustering process is here completely arbitrary. 
For all such matrixes PERTEX signals this type of comment to the user. It is technically 
possible to fulfil an analysis based on a matrix with binary '1' only in the diagonal, but it will 
be done on the user's responsibility. The 70/30-matrix from this text is not handled any further 
here. 

The fourth type of matrixes, 80/30, is reported on page A11 as a 1x1 matrix, which, as the 
comment says, is No matrix. 

The matrixes generated from the text discussed here represent a general pattern found for 
many texts. The 50/30-matrix, for Figure, is the largest matrix. Then the matrixes for Ground, 
Means and Goal are decreasing in size, or perhaps not present at all. These observations are 
only technical remarks and do not say anything about the final results of the analysis. 



Cluster analysis based on A/O-matrixes 

The binary matrixes discussed in the previous section are used for cluster analysis. First the 
Agents are interpreted as variables and the 50/30-, 60/30-, 70/30- and 80/30-matrixes are set 
up as conventional data matrixes with the dimension n x p. By clustering these types of 
matrixes we expose the structural relations for the Objectives in the text. When clustering the 
transpose of these matrixes we expose the text producer's perspective on the Objectives. 

Ward's method, Ward (1963), is used for clustering. This is a robust and well-known method 
which generates valid and interesting results in PERTEX. In Ward's method the ESS-value 
(Error Sum of Squares) is used as the clustering criterion. The ESS-value for one variable and 
n items is calculated by the formula: 

$$ESS = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$$

where x_i is the value of the variable for item i. 

For a binary variable the ESS-formula can be transformed to: 

$$ESS_B = \frac{NB_1 \cdot NB_0}{n}$$

where NB_1 is the number of binary '1'-values for the variable and NB_0 is the number of binary 
'0'-values for the same variable. By definition NB_1 + NB_0 = n. The total ESS-value for a 
binary data matrix is calculated by accumulation of the ESS_B-values for all the variables. 

By using the ESS_B-formula in PERTEX the calculation of ESS-values can be done in a very 
effective way. The simple idea involved is the fact that all values in a binary data matrix are 
'1' or '0'. The ESS-value for a cluster of m items and p variables is in the general case based 
on m x p values. With binary variables the calculation of ESS_B for such a cluster can be based 
on only p values. Each of these p values marks the number of binary '1' in the cluster for that 
variable. 
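
The saving can be checked with a short sketch that computes the ESS of one cluster in both ways. The function names are hypothetical. The binary version uses NB_1 (m - NB_1) / m per variable, which follows from the general formula because x squared equals x for binary values.

```python
def ess_general(rows):
    """ESS of a cluster computed from all m x p values."""
    m, p = len(rows), len(rows[0])
    total = 0.0
    for j in range(p):
        column = [row[j] for row in rows]
        total += sum(x * x for x in column) - sum(column) ** 2 / m
    return total


def ess_binary(counts, m):
    """ESS of a cluster of m binary rows from p counts of '1' only."""
    return sum(nb1 * (m - nb1) / m for nb1 in counts)


cluster = [[1, 0, 1],
           [1, 1, 0],
           [0, 0, 1]]
counts = [sum(column) for column in zip(*cluster)]  # '1's per variable
print(ess_general(cluster), ess_binary(counts, len(cluster)))
```

When two clusters are fused their count vectors are simply added, so the p counts are all that has to be stored per cluster.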

The clustering process in PERTEX is also built according to the special form of the binary 
data matrixes used in this phase of the analysis of the text. As illustrated in the appendix, page 
A9, the binary matrix has significantly more '0'-elements than '1'-elements. Therefore it is 
effective to base the calculation of ESS_B-values on the '1'-elements. If we know the 
coordinates for all the '1'-elements, then we also know the whole matrix. This specific 
character of the binary A/O-matrix is used in PERTEX to optimize the comprehensive 
searching for the next fusion in the clustering process and for updating a similarity matrix 
used in that process. 

Despite the special measures taken in the design of the clustering process, the results produced in 
PERTEX are exactly according to Ward's method. The overall purpose for the specific 
implementation of Ward's method in PERTEX is the optimization of calculation to minimize 
the need for computer time. After many improvements the algorithm can now handle rather 
big matrixes in an acceptable amount of time. A binary matrix of dimension 132 x 74 is for 
example clustered in 48 seconds on a 386-PC. 

The result from a cluster analysis of the 50/30-matrix in the appendix is reported on page A12.
The whole clustering process, going from 17 to 2 clusters, is documented with the fused rows and
the ESS-value for every step. The ESS-values are reported both as the increase of ESS in every
step and as the accumulated ESS so far in the process.
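
A step-by-step log of this kind can be produced by a naive version of Ward's method for binary data. The sketch below is deliberately unoptimized and is not the PERTEX implementation; the function names, the tie-breaking (the first pair found with the smallest increase wins) and the example matrix are assumptions made for illustration.

```python
from itertools import combinations


def cluster_ess(matrix, members):
    """ESS_B of one cluster: per variable, (number of '1') * (number of '0') / m."""
    m = len(members)
    total = 0.0
    for col in range(len(matrix[0])):
        ones = sum(matrix[r][col] for r in members)
        total += ones * (m - ones) / m
    return total


def ward_binary(matrix, stop_at=2):
    """Fuse clusters until stop_at remain; return the clusters and one log entry per fusion."""
    clusters = [(r,) for r in range(len(matrix))]
    accumulated = 0.0
    log = []
    while len(clusters) > stop_at:
        best = None
        # Every fusion is chosen after evaluating all possible pairs.
        for a, b in combinations(clusters, 2):
            increase = (cluster_ess(matrix, a + b)
                        - cluster_ess(matrix, a) - cluster_ess(matrix, b))
            if best is None or increase < best[0]:
                best = (increase, a, b)
        increase, a, b = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        accumulated += increase
        log.append((a, b, increase, accumulated))
    return clusters, log


# Hypothetical 4 x 4 binary matrix (rows are items, columns are variables).
example = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
final_clusters, fusion_log = ward_binary(example)
for fused_a, fused_b, step_ess, total_ess in fusion_log:
    print(fused_a, fused_b, step_ess, total_ess)
```

Each log entry holds the two fused clusters (0-based row indices), the increase of ESS in that step and the accumulated ESS, mirroring the two ways the ESS-values are reported.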

A decision on the number of significant clusters can be based on a cut-off point in the
ESS-values. To improve the support for the user's decision on the number of significant clusters,
t-tests can be used. The ESS-values are tested in one-sided t-tests, both for the step by step
values and the accumulated total values. Two tests with different degrees of freedom can be
ordered. The test of step by step values with DF = (number of items - 2) is the same test as
proposed in Wishart (1987). The test of step by step values with DF = (present number of
clusters - 1) has been proposed in Bierschenk & Bierschenk (1986 c) as a more conservative way
of testing ESS-values in connection with Perspective Text Analysis. Practical experience so far
indicates that the second way of testing seems to be a robust way to transform the cut-off
point for ESS-values into a standardized value of significance. The corresponding t-tests for the
total ESS-values are under observation and the relevance of these tests will be evaluated on
the basis of future experience. The different t-tests are optional and can easily be ordered by
the user. An automatic cut-off for useless values of significance greater than 0.5 saves some
computer time.

In the example reported on page A12 both the cut-off point for the ESS-values step by step
and the corresponding t-tests indicate that seven clusters seem to be the greatest number of
significant clusters in this example. The cluster tree can be printed with the dynamic scale
based on the step by step ESS-values or the total ESS-values. When a specific number of
clusters is selected a dotted line indicates the cut-off in the tree.

A hierarchical cluster tree can be organized and presented in many different ways. For a tree
with n items clustered, the orientation of the branches at the nodes can be combined in 2^(n-1)
different ways. All the cluster trees produced in PERTEX are organized and printed according
to the following two rules. The rules are expressed under the assumption of a horizontal
representation of the tree, see the trees on page A12.

- If the number of clustered items is not equal on both branches at a node, the
branch with the greatest number of items will be oriented upwards.

- If the number of clustered items is equal on both branches at a node, the branch
with the lowest item number, Row no, will be oriented upwards.
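
The two rules can be expressed as a small recursive procedure. The sketch below is hypothetical: it represents a tree as nested pairs with plain integers as Row no, and it reads "the branch with the lowest item number" as the branch containing the smallest Row no, which is an interpretation rather than something the text states. The example tree is invented and is not the tree on page A12.

```python
def leaves(tree):
    """Return the item numbers (Row no) under a node; a leaf is a plain int."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def orient(tree):
    """Reorder every node so that its first branch is the one drawn upwards."""
    if isinstance(tree, int):
        return tree
    left, right = orient(tree[0]), orient(tree[1])
    n_left, n_right = len(leaves(left)), len(leaves(right))
    if n_left != n_right:
        # Rule 1: the branch with the greater number of items goes up.
        return (left, right) if n_left > n_right else (right, left)
    # Rule 2: equal sizes, the branch holding the lowest item number goes up.
    return (left, right) if min(leaves(left)) < min(leaves(right)) else (right, left)


# Hypothetical tree with 7 items; each pair is one fusion.
tree = ((5, 10), ((3, 4), (7, (2, 1))))
oriented = orient(tree)
top_to_bottom = leaves(oriented)  # printing order of the items

# One free left/right choice per fusion gives 2^(n-1) layouts for n items.
layouts_for_17_items = 2 ** (17 - 1)
```

Because the rules fix the orientation at every node, each clustering result has exactly one printed form out of the 2^(n-1) possible ones.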

The items clustered, here unique text strings for 50-Objectives, are indicated by their Row no
to the left on page A12. Row no refers to rows in the 50/30-matrix.

These two rules for the design of the cluster tree are very important. They are the technical key
to the interpretation of the clustering as a result of a process and not only as a hierarchical
organization of certain items. More about this will be outlined in the next section.

With this design of the cluster tree we normally get a special cluster at the top. This top
cluster is created at the lowest ESS-value according to the cut-off line and consists of
'residual' items not fused to other clusters because of dissimilarity. The top cluster for
CLUSTER TREE 5030 on page A12 consists of the items 1, 2, 7, 12 and 15. The 50/30-matrix
on page A9 shows that these five items are the only items, of seventeen in total, that
have no binary '1'-connection to any other item via a common Agent. Item 5 and item 10, for
example, have such a connection via Agent 4. In the cluster tree on page A12 we also find
that item 5 and item 10 create a cluster, the third cluster from the top. The clusters are
identified by number of sequence from top to bottom along the cut-off line. These numbers are
used as reference numbers in the next section.
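
The property noted for the top cluster, items without any '1'-connection to another item via a common Agent, can be checked directly on a binary matrix. The following sketch is illustrative only: the function name and the small matrix are hypothetical, and the check describes the connection property, not the way Ward's method forms the top cluster.

```python
def residual_items(matrix):
    """Return 1-based row numbers that share no '1' column with any other row."""
    column_totals = [sum(col) for col in zip(*matrix)]
    residual = []
    for number, row in enumerate(matrix, start=1):
        # A row is connected if one of its '1' columns is also used by another row.
        shares_agent = any(v == 1 and column_totals[c] > 1 for c, v in enumerate(row))
        if not shares_agent:
            residual.append(number)
    return residual


# Hypothetical matrix: rows are Objective items, columns are Agents.
objectives_by_agents = [
    [1, 0, 0, 0],  # item 1: only Agent 1, used by no other item
    [0, 1, 0, 0],  # item 2: shares Agent 2 with item 4
    [0, 0, 1, 0],  # item 3: only Agent 3, used by no other item
    [0, 1, 0, 1],  # item 4: shares Agent 2 with item 2
]
unconnected = residual_items(objectives_by_agents)
```

In this invented matrix items 1 and 3 are unconnected, while items 2 and 4 are linked via a common Agent, in the same way as items 5 and 10 in the text.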

So far we have discussed clustering based on the 50/30-matrix. This analysis gives the
structural relations for the Figure. The interpretation of this cluster analysis, especially the
synthesis from clusters according to the cluster tree, will be explained in the next section.

The transposed 50/30-matrix, the 30/50-matrix, is also used in a cluster analysis, see page
A14. The Agents are here clustered with the 50-Objectives used as variables. This clustering
is done in exactly the same way as described for the 50/30-matrix. Seven significant clusters
are created in this analysis. This text is to some extent special as we find the same number of
significant clusters for Objectives and for Agents. Normally, and particularly for longer texts,
we get more O-clusters than A-clusters. The use of the result from the clustering of the
30/50-matrix will be demonstrated in the next section. An important connection between the
clustering of Objectives and Agents will also be discussed in the next section.

The Ground matrix 60/30 and its transpose 30/60 are also clustered in exactly the same way
as the 50/30- and 30/50-matrixes. The results from the 60/30-clustering are reported on page
A16, and the results from the 30/60-clustering on page A18. In both cases four significant
clusters are created.



Naming of clusters and synthesis from clusters 

These are the last phases in Perspective Text Analysis. Here the user of PERTEX has to do the
important part of the job, and the system is mainly used as an instrument for documentation
and organization of text material. The basic idea is that all the preceding steps of analysis will
now end up in a description, topological in nature, of the structural relations uncovered by the
clustering of Objectives and of the text producer's perspective on the Objectives uncovered by
the clustering of Agents.

The seven significant clusters from the 50/30-analysis are reported on page A13. Here the
unique 50-Objective strings are printed for each cluster. The naming of clusters is now an
important task for the user. Based on the text strings in the cluster a prototypic
naming/description of the cluster must be created. This task can be more or less difficult.
Different texts, and alternative numbers of significant clusters selected, will require a varying
amount of intellectual effort in finding appropriate names for clusters.

For cluster 1 with the unique 50-Objective strings 1, 2, 7, 12 and 15, the prototypic name is
set to Preparedness. Cluster 1 is the cluster that collects the 'residual' items mentioned earlier.
The name of such a cluster has to be relatively broad to sum up a general frame present in the
text and expressed by the text strings in the cluster. Generally one or more synonyms can be
found for naming of a cluster. The situation described in cluster 1 indicates different forms of
threats. Without these threats there is no preparedness. Threats and preparedness can be seen as
alternatives for characterising the general frame expressed in cluster 1.

For a small text, like the present one, some clusters consist of only a few text strings and the
naming may have to be based on one 'dominating' item. In cluster 2 item 4 'it' is of no help
in naming the cluster. The proposed name Direction is easily found as a general expression
for what is prototypic for item 3. The naming of the remaining five clusters is found on page
A13 and will not be discussed here in detail.

It is important to notice that the clusters are named independently of each other. The separate
text strings for each cluster are presented on screen when PERTEX is used interactively for
the naming task. Names for clusters can easily be changed on screen.

So far the analysis has, in all the phases, from dictionary coding to naming of clusters,
manipulated certain and mainly rather small parts of the text. Now we are ready, finally, to
see how all these isolated manipulations will end up in a synthesis that grasps the intention of
the entire text. This is practically possible mainly because of the unique AaO-paradigm used
as a theoretical base. In a more technical sense, it is of great importance that, in the clustering
process, every fusion is based on calculations for all possible fusions. By this approach we are
now prepared to see how a synthesis from the named clusters, according to the structure of
the cluster tree, will uncover the structure of the intention the text presents.

On page A13, the right part, PERTEX has printed a box-like version of the cluster tree. This
box-tree is read from top to bottom. The box at the top represents the well-known cluster 1
from clustering the 50/30-matrix. To the left in this box-tree we find all the seven clusters
from that analysis. The names given to the clusters are printed in the boxes. Each box is
identified by the cluster number before the colon. After the colon comes the number of the
first item in the cluster. These numbers are useful as reference numbers
between the box-tree, the naming of clusters and the ordinary cluster tree.

When running the box-tree on screen for the first time only the cluster names are present. The
final task is to follow the arrows indicating a flow, according to the structure of the fusions in
the cluster tree, to build up a synthesis based on the cluster names. The first cluster
Preparedness will be transformed by cluster 2 Direction. The result from the fusion when
Preparedness is transformed by Direction is Determination. In the next step Determination is
transformed in a new fusion by cluster 3 Regime. The result is formulated as Domination. So
the synthesis process continues until the last box is filled with Safety after the transformation
of Violation of Order by Constraint.

In the Safety-box all the 50-Objective items, the Figure items, are assembled. Therefore we
can say that the Figure described in the text can be characterized by Safety. A remarkable
thing is that this conclusion can be made without really reading the text. The result is based
on a comprehensive, strict, formal and computerized analysis and a synthesis based on
certain elements from that analysis. To check this result it is of course necessary to read the
text. When doing so it is a bit astonishing to notice how well the whole text can be
characterized by Safety. What would be a better alternative?

In the box-tree it is not only the final box that is of interest. The structural relations indicated
by the arrows between the boxes demonstrate how, and from what, the Figure Safety is
created. The process from cluster 1 to the final box is unique for every text. Depending on the
complexity of the text, one or more sub-trees can be developed. A sub-tree handles a
separate 'line of thinking' in the text, e.g. a question or a theme, that is elaborated to the extent
that the cluster analysis is influenced. In the cluster tree it is possible to follow how such
discussions are linked together via fusions.

The synthesis from the 50-Objective clusters is only one part of the results for the Figure
component. The other part is the text producer's perspective on the Figure. This perspective is
handled via clustering of the 30/50-matrix. The clusters of Agents are created according to
the connection between the Agents and the Figure items. Every Agent cluster has its own
specific collection of such connections to the Figure. Therefore, the Figure items connected to
an Agent cluster are interpreted as an indication of the perspective on the Figure. Here we must
notice that we are not going to describe an Agent cluster as a group of unique Agents. Instead
it is the Figure items, as variables, which motivate such a cluster, that are of interest. The
clustering of Agents can be said to generate a number of 'agent positions' from which the
Figure is 'observed'. These positions define the perspective for observing the Figure. By
clustering the Agents connected to the Figure we uncover the structural relations for the text
producer's focus on the Figure.

The clustering of the 30/50-matrix is done in the same way as was described for the
50/30-matrix, see page A14. The interpretation of the Agent clusters is, however, already prepared
by the naming of Figure clusters. On page A15, middle of the page, a matrix for
connections between 50- and 30-clusters is reported. This matrix is nothing but an aggregated
form of the binary matrix from page A9. The aggregation is based on the clustering of both Figure
items and Agent items. The size of the matrix is set by the number of Figure clusters and the
number of Agent clusters, here 7 x 7. The numbers for identification of clusters discussed
earlier are used as identification of rows (Figure clusters) and columns (Agent clusters). The
cells in the matrix mark the number of connections between unique Figure items and unique
Agent items on cluster level. Notice that the sum of these numbers of connections is the same
as the sum of binary '1'-values in the matrix on page A9, and the same as the number of
coordinates in MATRIX STATISTICS on page A11.

By using the connections on cluster level we can easily handle the naming of Agent clusters
by references to the corresponding Figure cluster(s). In this example the connection matrix is
a diagonal matrix. This means that there is an exact correspondence between clusters on item
level. Therefore the naming of all the Agent clusters can be copied from the Figure clusters,
see the right part on page A15. The box-tree for the perspective on Figure is just the same tree
as the tree for Figure. In this text the perspective on the Figure is the same as the Figure. The
text is not written to put some aspects in focus; all aspects are emphasized at the same level.
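
The aggregation into a connection matrix and the check for a diagonal form can be sketched in Python. This is an illustration under assumptions, not the PERTEX routine: the function names, the cluster assignments and the small binary matrix are hypothetical and much smaller than the 7 x 7 example in the appendix.

```python
def connection_matrix(binary, row_cluster, col_cluster):
    """Aggregate a binary item matrix into counts of '1'-connections per cluster pair.

    row_cluster[i] and col_cluster[j] give the 1-based cluster number
    of row i (Figure item) and column j (Agent item).
    """
    result = [[0] * max(col_cluster) for _ in range(max(row_cluster))]
    for i, row in enumerate(binary):
        for j, value in enumerate(row):
            if value == 1:
                result[row_cluster[i] - 1][col_cluster[j] - 1] += 1
    return result


def is_diagonal(matrix):
    """True when every off-diagonal cell of a square matrix is zero."""
    return all(v == 0 or i == j for i, row in enumerate(matrix) for j, v in enumerate(row))


# Hypothetical binary matrix: 4 Figure items (rows) by 4 Agent items (columns).
binary = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
figure_clusters = [1, 1, 2, 3]  # cluster number for each Figure item
agent_clusters = [1, 1, 2, 3]   # cluster number for each Agent item

connections = connection_matrix(binary, figure_clusters, agent_clusters)
total_connections = sum(map(sum, connections))
total_ones = sum(map(sum, binary))
```

The total of the cells equals the number of '1'-values in the binary matrix, which is the consistency check mentioned in the text, and a diagonal result means each Agent cluster can take its name from exactly one Figure cluster.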

The diagonal form of the connection matrix is often found for shorter texts and texts
primarily reporting facts. When a text has more of discussion or argumentation we normally
get more Figure clusters than Agent clusters. In that case the connection matrix cannot be a
diagonal matrix. Instead the matrix usually describes how a number of the first Figure
clusters are all connected to the first Agent cluster. This means that the process in the Figure
tree, from cluster 1 to cluster x, is concentrated in one Agent cluster. It is the point reached so
far in the Figure tree that is in focus and will be taken as name for the first Agent cluster.
From this starting point for the perspective on the Figure the remaining Agent clusters are
handled according to the connection matrix. In some cases the elaboration of the perspective
on the Figure will imply that the perspective represents a transformation of the direction in a
part of the process in the Figure tree. This means that, in the perspective, Figure clusters can
be picked up and transformed in a process that is not only a compressed copy of the Figure
process.

The perspective on the Figure is a unique quality in Perspective Text Analysis. The use of the
connection matrix is a key instrument for uncovering the structural relation in the perspective.
So far PERTEX has no automatic routines for the use of the connection matrix. The matrix is
automatically produced and can also be used to reproduce the binary matrix for connection
between unique Figure items and unique Agent items. If the number of Figure clusters
selected is set to the number of unique Figure items, and the same is done for unique Agent
items, then the connection matrix will reproduce the binary matrix. This new matrix has,
however, the rows and columns organized according to the specific sequence of items
obtained from the organization of the cluster tree. The pattern of '1'-values in the matrix will
be more like a diagonal than in the original binary matrix. Such a matrix can be useful for
experiments and special studies of combined Figure and Agent clustering.
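
The effect of reorganizing rows and columns by the sequence of items in the cluster trees can be shown with a minimal sketch. The function name, the matrix and the two orders below are hypothetical; in practice the orders would be read off the printed cluster trees.

```python
def reorder(binary, row_order, col_order):
    """Rearrange a binary matrix; the orders are 0-based item indices in tree sequence."""
    return [[binary[r][c] for c in col_order] for r in row_order]


# Hypothetical binary matrix in its original item order.
original = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
row_order = [1, 2, 0]  # assumed sequence of Figure items from their cluster tree
col_order = [0, 1, 2]  # assumed sequence of Agent items from their cluster tree

rearranged = reorder(original, row_order, col_order)
for row in rearranged:
    print(row)
```

The rearranged matrix contains the same '1'-values as the original; only their positions change, here so that they fall on the diagonal.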

The Ground, Means and Goal components are technically handled in PERTEX in the same
way as has been described for the Figure. The results for the naming of Ground clusters and the
synthesis from these clusters are reported on page A17 and the perspective on page A19.
These results do not require any technical explanations as the results do not introduce any
new complications in using PERTEX or understanding Perspective Text Analysis.



Summary 

This paper has demonstrated the main steps in Perspective Text Analysis when using
PERTEX. As has been demonstrated, this kind of new and unique text analysis is not one
single type of analysis. PERTEX is built to cover all the different kinds of analysis, and
synthesis, involved. The technical output from the system is illustrated in the appendix with the
complete printer output from the analysis of a classic text. All information in this printer output
is also available on screen during the interactive use of PERTEX.

The different phases in the analysis, as discussed for the example in the appendix, are easily
handled in the interactive use of PERTEX. The process described for analysis of a text can
be stopped and restarted at any stage. The strict control of syntax prevents the user from leaving
one level of analysis before all errors are eliminated. Special indicators on screen mark the
status of the analysis for every text. A lot of parameters are automatically set for this purpose.
By other parameters the user has many options to design a personal use of PERTEX. By
using PERTEX it is now possible to practise Perspective Text Analysis on a text to get a quite
unique insight into the structural relations of the mentality the text presents.

APPENDIX: PRINTER OUTPUT FROM PERTEX

The appendix is organized in the following sections:

TEXT                      A3
CODING                    A5
MATRIXES                  A9
STATISTICS                A11
FIGURE                    A12
PERSPECTIVE ON FIGURE     A14
GROUND                    A16
PERSPECTIVE ON GROUND     A18

The content of this appendix is a complete printer output of an analysis via PERTEX. The
text selected for illustration of output from PERTEX is an English translation of Tacitus'
classical text on the Suiones, Chapter 44, sections two and three, Hutton (1958):

Beyond these tribes the states of the Suiones, not on, but in, the ocean, possess not
merely arms and men but powerful fleets: the style of their ships differs in this
respect, that there is a prow at each end, with a beak ready to be driven forwards;
they neither work it with sails, nor add oars in banks to the side: the gearing of
the oars is detached as on certain rivers, and reversible as occasion demands, for 
movement in either direction. Among these peoples, further, respect is paid to 
wealth, and one man is accordingly supreme, with no restrictions and with an 
unchallenged right to obedience; nor is there any general carrying of arms here, 
as among the other Germans: rather they are locked up in charge of a warder, 
and that warder a slave. The ocean forbids sudden inroads from enemies; and, 
besides, bands of armed men, with nothing to do, easily become riotous: it is not 
to the king's interest to put a noble or a freeman or even a freedman in charge of 
the arms. 

The purpose of using this text here is not to discuss historical or other aspects of the Suiones
or Tacitus' text compared to other texts. The text is only used as an illustration of how a text
is processed by PERTEX in Perspective Text Analysis. In a forthcoming article Bernhard
Bierschenk will discuss the outcome from PERTEX-analysis of Tacitus' Latin text as well as
analysis of translations of the text into five other languages.

During the interactive use of PERTEX you can select just the printer
output you want to have. All the information in the printer output is also available on the
PC-screen.

TEXT EDITED IN PERTEX